Goto

Collaborating Authors

 neural operator


Smooth Piecewise Cutting for Neural Operator to Handle Discontinuities and Sharp Transitions

arXiv.org Machine Learning

Neural operators have achieved strong performance in learning solution operators of partial differential equations (PDEs), but their inherently continuous representations struggle to capture discontinuities and sharp transitions. Existing approaches typically approximate such features within continuous function spaces, often requiring increased model capacity and high-resolution data. In this work, we propose Cut-DeepONet, a two-stage training framework that explicitly models discontinuities while reducing learning complexity. Our approach reformulates the problem via a lifting strategy, partitioning the domain into smooth subregions while representing discontinuities as boundaries in a higher-dimensional space. This separation aligns the operator learning task with the inductive bias of neural networks and avoids directly approximating discontinuities. An additional network predicts input-dependent discontinuity locations for unseen inputs, which are then used to guide the neural operator in generating smooth components within each region. Experiments on benchmark PDEs show that Cut-DeepONet outperforms state-of-the-art methods, even when trained on low-resolution datasets. The method excels on problems with discontinuities and sharp transitions, while using fewer trainable parameters. Our results highlight the benefits of changing the representation of operator learning rather than increasing model complexity.


One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

arXiv.org Machine Learning

Probabilistic conditioning is concerned with the identification of a distribution of a random variable $X$ given a random variable $Y$. It is a cornerstone of scientific and engineering applications where modeling uncertainty is key. This problem has traditionally been addressed in machine learning by directly learning the conditional distribution of a fixed joint distribution. This paper introduces a novel perspective: we propose to solve the conditioning problem by identifying a single operator that maps any joint density to its conditional, thus amortizing over joint-conditional pairs. We establish that the conditioning operator can be approximated to arbitrary accuracy by neural operators. Our proof relies on new results establishing continuity of the conditioning operator over suitable classes of densities. Finally, we learn the conditioning map for a class of Gaussian mixtures using neural operators, illustrating the promise of our framework. This work provides the theoretical underpinnings for general-purpose, amortized methods for probabilistic conditioning, such as foundation models for Bayesian inference.


HS-FNO: History-Space Fourier Neural Operator for Non-Markovian Partial Differential Equations

arXiv.org Machine Learning

Neural operators provide fast surrogate models for time-dependent partial differential equations, but their standard autoregressive use usually assumes that the instantaneous field $u(t,\cdot)$ is a complete state. This assumption fails for delay equations, distributed-memory systems, and other non-Markovian dynamics: two trajectories may agree at time $t$ and nevertheless have different futures because their histories differ. We introduce the History-Space Fourier Neural Operator (HS-FNO), a neural operator for delay and memory-driven PDEs formulated on the lifted state $u_t(ฮธ,x)=u(t+ฮธ,x)$, $ฮธ\in[-ฯ„,0]$. The key computational step is to decompose one history-state update into a learned predictor for the newly exposed future slice and an exact shift-append transport for the portion of the history window already known from the previous state. This avoids learning deterministic history coordinates, reduces the learned output dimension, and enforces the natural discrete history update. We test HS-FNO on five benchmark families covering delayed reaction--diffusion, spatial epidemiology, nonlocal neural-field dynamics, delayed waves, and distributed-memory closures. Across ten random seeds, HS-FNO attains the lowest aggregate one-step, history-space, and rollout errors among the principal baselines. The largest gain occurs in autoregressive prediction, where aggregate rollout error decreases from $0.241$, $0.188$, and $0.185$ for current-state, lag-stack, and unconstrained history-to-history operators, respectively, to $0.094$. The same model uses fewer parameters than unconstrained history prediction. These results indicate that enforcing the discrete shift structure of history-state evolution is an effective inductive bias for non-Markovian PDE surrogate modeling.


Generalization Error Bounds for Picard-Type Operator Learning in Nonlinear Parabolic PDEs

arXiv.org Machine Learning

Operator learning for partial differential equations (PDEs) aims to learn solution operators on infinite-dimensional function spaces from finite-resolution data. In this setting, it is important for the learned model to be discretization-invariant, or resolution-robust, and to reflect PDE-specific structure. It is therefore natural to ask how such structure should be encoded in the model architecture, hypothesis class, or learning procedure. In this paper, we study operator learning for solution operators of nonlinear parabolic PDEs based on Duhamel--Picard iteration. We formulate Picard iteration as an abstract state-transition model and present a theoretical framework for Picard-type operator learning. We derive implementation-agnostic generalization error bounds that separate the implementation error from the estimation error associated with the abstract state-transition model induced by Picard iteration. A key consequence is that increasing the Picard depth reduces the Picard truncation error without causing an unbounded growth of the entropy-based estimation error. We also extend the analysis to long-time prediction by rolling out the same learned local model over successive time blocks. Finally, we illustrate the theory for nonlinear heat equations on the torus using a Picard-type Fourier neural operator as a concrete implementation.


Convolutional Neural Operators for robust and accurate learning of PDEs

Neural Information Processing Systems

Although very successfully used in conventional machine learning, convolution based neural network architectures - believed to be inconsistent in function space - have been largely ignored in the context of learning solution operators of PDEs. Here, we present novel adaptations for convolutional neural networks to demonstrate that they are indeed able to process functions as inputs and outputs. The resulting architecture, termed as convolutional neural operators (CNOs), is designed specifically to preserve its underlying continuous nature, even when implemented in a discretized form on a computer. We prove a universality theorem to show that CNOs can approximate operators arising in PDEs to desired accuracy. CNOs are tested on a novel suite of benchmarks, encompassing a diverse set of PDEs with possibly multi-scale solutions and are observed to significantly outperform baselines, paving the way for an alternative framework for robust and accurate operator learning.




Operator Learning for Smoothing and Forecasting

arXiv.org Machine Learning

Machine learning has opened new frontiers in purely data-driven algorithms for data assimilation in, and for forecasting of, dynamical systems; the resulting methods are showing some promise. However, in contrast to model-driven algorithms, analysis of these data-driven methods is poorly developed. In this paper we address this issue, developing a theory to underpin data-driven methods to solve smoothing problems arising in data assimilation and forecasting problems. The theoretical framework relies on two key components: (i) establishing the existence of the mapping to be learned; (ii) the properties of the operator learning architecture used to approximate this mapping. By studying these two components in conjunction, we establish novel universal approximation theorems for purely data driven algorithms for both smoothing and forecasting of dynamical systems. We work in the continuous time setting, hence deploying neural operator architectures. The theoretical results are illustrated with experiments studying the Lorenz `63, Lorenz `96 and Kuramoto-Sivashinsky dynamical systems.


Can neural operators always be continuously discretized?

Neural Information Processing Systems

In this work we consider the problem of discretization of neural operators in a general setting. Using category theory, we give a no-go theorem that shows that diffeomorphisms between Hilbert spaces may not admit any continuous approximations by diffeomorphisms on finite spaces, even if the discretization is non-linear. This shows how infinite-dimensional Hilbert spaces and finite-dimensional vector spaces fundamentally differ. A key take-away is that to obtain discretization invariance, considerable effort is needed to ensure that finite-dimensional approximations of neural operator converge not only as sequences of functions, but that their representations converge in a suitable sense as well. With this perspective, we give several positive results. We first show that strongly monotone diffeomorphism operators always admit finite-dimensional strongly monotone diffeomorphisms. Next we show that bilipschitz neural operators may always be written via the repeated alternating composition of strongly monotone neural operators and invertible linear maps. We also show that such operators may be inverted locally via iteration provided that such inverse exists. Finally, we conclude by showing how our framework may be used `out of the box' to prove quantitative approximation results for discretization of neural operators.


Alias-Free Mamba Neural Operator

Neural Information Processing Systems

Benefiting from the booming deep learning techniques, neural operators (NO) are considered as an ideal alternative to break the traditions of solving Partial Differential Equations (PDE) with expensive cost.Yet with the remarkable progress, current solutions concern little on the holistic function features--both global and local information-- during the process of solving PDEs.Besides, a meticulously designed kernel integration to meet desirable performance often suffers from a severe computational burden, such as GNO with $O(N(N-1))$, FNO with $O(NlogN)$, and Transformer-based NO with $O(N^2)$.To counteract the dilemma, we propose a mamba neural operator with $O(N)$ computational complexity, namely MambaNO.Functionally, MambaNO achieves a clever balance between global integration, facilitated by state space model of Mamba that scans the entire function, and local integration, engaged with an alias-free architecture. We prove a property of continuous-discrete equivalence to show the capability ofMambaNO in approximating operators arising from universal PDEs to desired accuracy. MambaNOs are evaluated on a diverse set of benchmarks with possibly multi-scale solutions and set new state-of-the-art scores, yet with fewer parameters and better efficiency.